
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Jyutping Romanization: Parsing and Conversion &#8212; PyCantonese 2.2.0 documentation</title>
    <link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Search Queries" href="searches.html" />
    <link rel="prev" title="Corpus Reader Methods" href="reader.html" /> 
  </head><body>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="searches.html" title="Search Queries"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="reader.html" title="Corpus Reader Methods"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">PyCantonese 2.2.0 documentation</a> &#187;</li> 
      </ul>
    </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Jyutping Romanization: Parsing and Conversion</a><ul>
<li><a class="reference internal" href="#parsing-jyutping-strings">Parsing Jyutping Strings</a></li>
<li><a class="reference internal" href="#jyutping-to-yale-conversion">Jyutping-to-Yale Conversion</a></li>
<li><a class="reference internal" href="#jyutping-to-tipa-conversion">Jyutping-to-TIPA Conversion</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="reader.html"
                        title="previous chapter">Corpus Reader Methods</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="searches.html"
                        title="next chapter">Search Queries</a></p>
        </div>
      </div>

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="jyutping-romanization-parsing-and-conversion">
<h1>Jyutping Romanization: Parsing and Conversion<a class="headerlink" href="#jyutping-romanization-parsing-and-conversion" title="Permalink to this headline">¶</a></h1>
<p>Many common tasks in handling Cantonese corpus data involve processing
<a class="reference external" href="http://lshk.org/node/47">Jyutping romanization</a>, e.g., searching for words by a
particular Jyutping element (an onset, a tone, or something else).
Such tasks require parsing a string of Jyutping romanization into
its phonological components. Moreover, several other Cantonese romanization
schemes are actively used alongside Jyutping, and PyCantonese provides
tools for converting Jyutping to some of them.</p>
<div class="section" id="parsing-jyutping-strings">
<h2>Parsing Jyutping Strings<a class="headerlink" href="#parsing-jyutping-strings" title="Permalink to this headline">¶</a></h2>
<p>The function <code class="docutils literal notranslate"><span class="pre">parse_jyutping()</span></code> parses a string of Jyutping romanization
and returns a list of tuples, one per syllable; the string may contain
romanization for multiple Chinese characters. The parsed romanization for each
character is a 4-tuple of strings, (onset, nucleus, coda, tone), in which an
absent element is the empty string:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="kn">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">parse_jyutping</span><span class="p">(</span><span class="s1">&#39;hou2&#39;</span><span class="p">)</span>  <span class="c1"># 好</span>
<span class="go">[(&#39;h&#39;, &#39;o&#39;, &#39;u&#39;, &#39;2&#39;)]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">parse_jyutping</span><span class="p">(</span><span class="s1">&#39;gwong2dung1waa2&#39;</span><span class="p">)</span>  <span class="c1"># 廣東話</span>
<span class="go">[(&#39;gw&#39;, &#39;o&#39;, &#39;ng&#39;, &#39;2&#39;), (&#39;d&#39;, &#39;u&#39;, &#39;ng&#39;, &#39;1&#39;), (&#39;w&#39;, &#39;aa&#39;, &#39;&#39;, &#39;2&#39;)]</span>
</pre></div>
</div>
<p>Syllabic nasals are treated as nuclei:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="kn">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">parse_jyutping</span><span class="p">(</span><span class="s1">&#39;m4goi1&#39;</span><span class="p">)</span>  <span class="c1"># 唔該</span>
<span class="go">[(&#39;&#39;, &#39;m&#39;, &#39;&#39;, &#39;4&#39;), (&#39;g&#39;, &#39;o&#39;, &#39;i&#39;, &#39;1&#39;)]</span>
</pre></div>
</div>
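<p>As a minimal sketch of how the 4-tuples might be used to search by a Jyutping
element, the following filters syllables by any combination of onset, nucleus,
coda, and tone. The helper <code class="docutils literal notranslate"><span class="pre">filter_syllables()</span></code> is hypothetical
and not part of PyCantonese; the input is copied from the outputs shown above,
so the sketch runs without the library installed:</p>

```python
# Parsed output copied from the examples above, so that this sketch
# runs without PyCantonese installed.
parsed = [
    ('gw', 'o', 'ng', '2'),
    ('d', 'u', 'ng', '1'),
    ('w', 'aa', '', '2'),
    ('', 'm', '', '4'),
]


def filter_syllables(syllables, onset=None, nucleus=None, coda=None, tone=None):
    """Keep the 4-tuples matching every element given (None means "any").

    Hypothetical helper for illustration; not a PyCantonese function.
    """
    wanted = (onset, nucleus, coda, tone)
    return [
        syllable for syllable in syllables
        if all(w is None or w == got for w, got in zip(wanted, syllable))
    ]


# Syllables carrying tone 2
print(filter_syllables(parsed, tone='2'))
# Syllabic nasal: empty onset with "m" as the nucleus
print(filter_syllables(parsed, onset='', nucleus='m'))
```

<p>Passing an empty string (rather than <code class="docutils literal notranslate"><span class="pre">None</span></code>) asks for an element that is
absent, which is how the syllabic nasal is picked out in the second call.</p>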
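<p>The function <code class="docutils literal notranslate"><span class="pre">parse_jyutping()</span></code> raises a <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> for invalid Jyutping romanization, such as a tone outside the range 1 to 6:</p>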
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="kn">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">parse_jyutping</span><span class="p">(</span><span class="s1">&#39;hou7&#39;</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">&quot;&lt;stdin&gt;&quot;</span>, line <span class="m">1</span>, in <span class="n">&lt;module&gt;</span>
  File <span class="nb">&quot;/usr/local/lib/python3.4/dist-packages/pycantonese/jyutping.py&quot;</span>, line <span class="m">197</span>, in <span class="n">parse_jyutping</span>
    <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">&#39;tone error -- &#39;</span> <span class="o">+</span> <span class="nb">repr</span><span class="p">(</span><span class="n">jp</span><span class="p">))</span>
<span class="gr">ValueError</span>: <span class="n">tone error -- &#39;hou7&#39;</span>
</pre></div>
</div>
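<p>When screening many strings, it may be convenient to catch the
<code class="docutils literal notranslate"><span class="pre">ValueError</span></code> rather than let it propagate. The sketch below shows one
way to do so. Both <code class="docutils literal notranslate"><span class="pre">parse_or_none()</span></code> and <code class="docutils literal notranslate"><span class="pre">stand_in_parser()</span></code> are
hypothetical names; the stand-in only imitates the tone check seen in the
traceback above so that the example runs without PyCantonese installed:</p>

```python
def parse_or_none(parser, jyutping):
    """Call parser(jyutping); return None if it raises ValueError."""
    try:
        return parser(jyutping)
    except ValueError:
        return None


def stand_in_parser(jyutping):
    # Hypothetical stand-in for pc.parse_jyutping: it only checks that the
    # final character is a tone digit from 1 to 6 and raises an error like
    # the one in the traceback above otherwise. It does no real parsing.
    if jyutping[-1:] not in tuple('123456'):
        raise ValueError('tone error -- ' + repr(jyutping))
    return jyutping


candidates = ['hou2', 'hou7']
results = {jp: parse_or_none(stand_in_parser, jp) for jp in candidates}
valid = [jp for jp, result in results.items() if result is not None]
print(valid)  # ['hou2']
```

<p>With PyCantonese installed, one would presumably pass <code class="docutils literal notranslate"><span class="pre">pc.parse_jyutping</span></code>
in place of the stand-in.</p>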
</div>
<div class="section" id="jyutping-to-yale-conversion">
<h2>Jyutping-to-Yale Conversion<a class="headerlink" href="#jyutping-to-yale-conversion" title="Permalink to this headline">¶</a></h2>
<p>Yale romanization is still commonly used, particularly in
dictionaries and
Cantonese language teaching resources. PyCantonese provides the
<code class="docutils literal notranslate"><span class="pre">jyutping2yale()</span></code>
function, which takes a valid Jyutping string and returns the Yale equivalent:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="kn">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">jyutping2yale</span><span class="p">(</span><span class="s1">&#39;m4goi1&#39;</span><span class="p">)</span>
<span class="go">&#39;m̀hgōi&#39;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">jyutping2yale</span><span class="p">(</span><span class="s1">&#39;gwong2dung1waa2&#39;</span><span class="p">)</span>
<span class="go">&#39;gwóngdūngwá&#39;</span>
</pre></div>
</div>
<p>In cases of potential ambiguity where a consonant letter could be part of
the syllable on the left or the right,
the quote <code class="docutils literal notranslate"><span class="pre">'</span></code> is used as a separator:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">jyutping2yale</span><span class="p">(</span><span class="s1">&#39;hei3hau6&#39;</span><span class="p">)</span>  <span class="c1"># 氣候; Yale &quot;h&quot; in 2nd syllable onset w/o separator would be ambiguous</span>
<span class="go">&quot;hei&#39;hauh&quot;</span>
</pre></div>
</div>
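<p><code class="docutils literal notranslate"><span class="pre">jyutping2yale()</span></code> accepts the optional parameter <code class="docutils literal notranslate"><span class="pre">as_list</span></code>; when it is
<code class="docutils literal notranslate"><span class="pre">True</span></code>, the function returns a list of Yale strings, one per syllable, instead of a single string:</p>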
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">jyutping2yale</span><span class="p">(</span><span class="s1">&#39;gwong2dung1waa2&#39;</span><span class="p">,</span> <span class="n">as_list</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="go">[&#39;gwóng&#39;, &#39;dūng&#39;, &#39;wá&#39;]</span>
</pre></div>
</div>
</div>
<div class="section" id="jyutping-to-tipa-conversion">
<h2>Jyutping-to-TIPA Conversion<a class="headerlink" href="#jyutping-to-tipa-conversion" title="Permalink to this headline">¶</a></h2>
<p>PyCantonese also offers the <code class="docutils literal notranslate"><span class="pre">jyutping2tipa()</span></code> function for users of the
<a class="reference external" href="https://www.ctan.org/pkg/tipa?lang=en">LaTeX TIPA</a> package; it returns a list of TIPA strings, one per syllable:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pycantonese</span> <span class="k">as</span> <span class="nn">pc</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">jyutping2tipa</span><span class="p">(</span><span class="s1">&#39;m4goi1&#39;</span><span class="p">)</span>
<span class="go">[&#39;\\s{m}21&#39;, &#39;kOY55&#39;]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">pc</span><span class="o">.</span><span class="n">jyutping2tipa</span><span class="p">(</span><span class="s1">&#39;gwong2dung1waa2&#39;</span><span class="p">)</span>
<span class="go">[&#39;k\\super w ON25&#39;, &#39;tUN55&#39;, &#39;wa25&#39;]</span>
</pre></div>
</div>
<p>Currently, tones are output as Chao tone numbers (digits from 1 to 5, e.g., 55
for a high level tone) suffixed directly to each syllable string.
(This may change in a future
release if this behavior proves to be inconvenient.)</p>
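<p>Because the tone digits are suffixed directly, a LaTeX user may want to
separate them from the segmental part before typesetting; to our understanding,
TIPA treats digits inside <code class="docutils literal notranslate"><span class="pre">\textipa{}</span></code> as shortcut characters for
phonetic symbols, which is worth checking against the TIPA manual. The sketch
below is one possible post-processing step, not a PyCantonese feature. The
helpers <code class="docutils literal notranslate"><span class="pre">split_tone()</span></code> and <code class="docutils literal notranslate"><span class="pre">to_latex()</span></code> are hypothetical, and rendering
the tone with <code class="docutils literal notranslate"><span class="pre">\textsuperscript{}</span></code> is merely one assumed presentation:</p>

```python
import re


def split_tone(tipa_syllable):
    """Split a syllable such as 'kOY55' into ('kOY', '55').

    Hypothetical helper; assumes the tone is a trailing run of digits 1-5.
    """
    match = re.fullmatch(r'(.*?)([1-5]+)', tipa_syllable)
    if match is None:
        raise ValueError('no trailing tone digits in ' + repr(tipa_syllable))
    return match.group(1), match.group(2)


def to_latex(tipa_syllables):
    """Wrap segments in \\textipa{} and set the tone digits outside it."""
    parts = []
    for syllable in tipa_syllables:
        segments, tone = split_tone(syllable)
        parts.append('\\textipa{' + segments + '}\\textsuperscript{' + tone + '}')
    return ' '.join(parts)


# Output of jyutping2tipa() copied from the example above.
syllables = ['k\\super w ON25', 'tUN55', 'wa25']
print(to_latex(syllables))
```

<p>Keeping the digits outside <code class="docutils literal notranslate"><span class="pre">\textipa{}</span></code> leaves them as ordinary numerals
in the typeset result.</p>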
</div>
</div>


          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer" role="contentinfo">
        &#169; Copyright 2014-2018, Jackson L. Lee | Documentation last updated on June 30, 2018.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.7.5.
    </div>
  </body>
</html>